The tremendous pandemic potential of coronaviruses was demonstrated twice in
the last 15 years by two global outbreaks of deadly pneumonia. Entry of
coronaviruses into cells is mediated by the transmembrane spike glycoprotein S,
which forms a trimer carrying receptor-binding and membrane fusion functions.
Despite their biomedical importance, coronavirus S glycoproteins have proven
difficult targets for structural characterization, precluding high-resolution studies
of the biologically relevant trimer. Recent technological developments in
single-particle cryo-electron microscopy allowed us to determine the first structure
of a coronavirus S glycoprotein trimer, which provided a framework to understand
the mechanisms of viral entry and suggested potential inhibition strategies for this
family of viruses. Here, we describe the key factors that enabled this
breakthrough.

Keywords: Coronavirus spike protein, cryo-electron microscopy, rational vaccine design, 
Rosetta, Relion 

Broad Audience Statement: The recent emergence of highly pathogenic coronaviruses
and the potential for future outbreaks have made the need for a vaccine urgent. Using
cryo-electron microscopy, the first structure of the key antigenic, infection-mediating
protein has been solved. This structure will assist rational vaccine design and the
development of strategies to combat this family of viruses.

Coronaviruses are enveloped viruses with a large positive-sense RNA genome. In
humans, coronaviruses are responsible for up to 30% of respiratory tract infections,
including mild upper respiratory tract infections (common cold), croup, bronchiolitis and
pneumonia. 1 In addition, coronaviruses have attracted considerable attention in the last
15 years due to the emergence of deadly viruses with tremendous pandemic potential:
severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East
respiratory syndrome coronavirus (MERS-CoV). 1,2 After its first occurrence, SARS-CoV
rapidly spread around the world, reaching all five continents and resulting in 8096
cases and 774 deaths by July 2003. The emergence of MERS-CoV in 2012 has
resulted in the infection of 1800 people and 640 deaths as of today. Currently, there are
no approved antiviral treatments or vaccines for any human coronavirus.

Coronaviruses use homotrimers of the spike (S) glycoprotein to promote cell attachment
and fusion of the viral and host membranes. As it is virtually the only antigen present at
the virus surface, S is the main target of neutralizing antibodies during infection and a
focus of vaccine design. 2 The coronavirus S is a class I viral fusion protein synthesized
as a single-chain precursor of ~1300 amino acids, which trimerizes upon folding. It
comprises an N-terminal S1 subunit containing the receptor-binding domain and a
C-terminal S2 subunit, which is the membrane-anchored stalk carrying out membrane
fusion. Cleavage by furin-like host proteases at the junction between S1 and S2 (S2
cleavage site) occurs during biogenesis for some coronaviruses such as murine
hepatitis virus (MHV, the prototypical and best studied coronavirus). 3-5 Coronavirus
spike proteins have proven difficult targets for structural characterization and all
reported studies have provided atomic resolution data for only a few isolated domains.
6-13 The SARS-CoV S has also been studied in its native environment by cryo-electron
microscopy (cryoEM) of intact virions, providing insights at low resolution into its overall
shape. 14,15 However, the lack of high-resolution data for any coronavirus spike trimer
until earlier this year had prevented a detailed analysis of the mechanisms associated
with infection.

Single-particle cryoEM is an increasingly important technique in structural biology, 
which enables the study of biological macromolecules in a near-native environment. 
CryoEM is undergoing a technological revolution due to the development of direct
detection cameras and dedicated algorithms for tracking beam-induced motion and
stage drift in recorded movies. 16-21 These advances led to an explosion in the number of
high-resolution structures determined using cryoEM worldwide for numerous proteins
and protein complexes that had previously been intractable using other structural
techniques.

We leveraged these recent advances to determine the first near-atomic resolution 
structure of a coronavirus S glycoprotein trimer earlier this year 22 [Fig. 1(A)-(D)]. These 
results paved the way for understanding the mechanisms of infection of viruses 
responsible for outbreaks of deadly pneumonia such as SARS-CoV and MERS-CoV. 
This article provides an in-depth analysis of the key methodological aspects that made
possible the determination of the structure of the MHV S ectodomain trimer. We
attribute this success to three main factors: the design of a pre-fusion
stabilized construct, the strategy employed for cryoEM data collection and processing,
and the availability of a recently developed de novo model building algorithm using
Rosetta. 23-25

Construct Design 

Viral fusion proteins adopt a metastable pre-fusion conformation at the virus surface
until triggered to rearrange into a more stable post-fusion conformation, which promotes
merger of viral and host membranes. 26 The large conformational changes taking place
during the fusion reaction could mask epitopes that are accessible in the pre-fusion
state and expose new epitopes specific to the post-fusion state. As a result, vaccine
design initiatives aim at targeting the pre-fusion state of viral fusion proteins, which
corresponds to the conformation that could be detected by the immune system before
infection. The intrinsic metastability of viral fusion proteins usually makes it difficult to
preserve the pre-fusion state during purification. This is illustrated by the case of the
respiratory syncytial virus (paramyxovirus) F protein, which required co-expression of the
ectodomain (fused to a T4 foldon motif) with a pre-fusion specific Fab to enable isolation
of this conformation. 27-29

During biogenesis, the MHV S protein is often naturally cleaved at the S1-S2 junction (S2
cleavage site) by Golgi-resident furin(-like) proteases 3,30 [Fig. 2(A)], resulting in an
increase in its fusogenic propensity. After cleavage, the S1 and S2 subunits remain
non-covalently associated in the metastable pre-fusion S trimer. In the case of SARS-CoV
and MERS-CoV, S2 processing has also been suggested to promote subsequent
cleavage at a second site located just upstream of the fusion peptide (S2' cleavage site)
to allow the fusion reaction to proceed upon virion uptake by a target cell. 4,5 We
engineered a construct featuring a single amino acid substitution in the S2 cleavage site
to prevent furin processing and enhance the stability of the MHV S ectodomain
pre-fusion structure. Substitution of the arginine residue at position 717 by a serine
residue at the site of cleavage (from RAHR↓ to RAHS) resulted in the purification of a
homogeneous uncleaved protein product, as confirmed by SDS-PAGE analysis [Fig.
2(B)].
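The effect of this substitution on the protease recognition motif can be illustrated with a short sketch. It assumes the minimal furin recognition pattern R-X-X-R, with cleavage after the final arginine; the function names are hypothetical, and only the four-residue fragment quoted above is used, since the flanking sequence is not given here.

```python
import re

# Minimal furin recognition pattern R-X-X-R (assumption: the minimal motif is
# used; many substrates follow the stricter R-X-[KR]-R consensus).
FURIN_MINIMAL = re.compile(r"R..R")


def has_furin_motif(seq: str) -> bool:
    """Return True if the sequence contains a minimal R-X-X-R motif."""
    return FURIN_MINIMAL.search(seq) is not None


def substitute(seq: str, index: int, new_residue: str) -> str:
    """Replace the residue at a 0-based index within the fragment."""
    return seq[:index] + new_residue + seq[index + 1:]


wild_type = "RAHR"                      # fragment quoted in the text
mutant = substitute(wild_type, 3, "S")  # arginine -> serine at the last position

print(mutant, has_furin_motif(wild_type), has_furin_motif(mutant))
```

Under this simplified pattern, the wild-type fragment matches and the mutant does not, which is consistent with the uncleaved product seen by SDS-PAGE.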

Although MHV S is known to oligomerize into homotrimers upon translation in vivo,
expression of the ectodomain yielded predominantly monomers, indicating that the
transmembrane domain is required for trimerization and/or trimer stabilization. To
promote oligomerization, an engineered trimerization motif based on the transcription
factor GCN4 31,32 was C-terminally fused to the MHV S ectodomain in frame with the
heptad repeat 2 (HR2) motif helix [Fig. 2(A)]. Biophysical analyses using analytical size
exclusion chromatography coupled online to multi-angle light scattering 33 (SEC-MALS)
as well as native mass spectrometry confirmed the trimeric organization of the
GCN4-stabilized MHV S ectodomain [Fig. 2(C)]. Proper folding of the purified MHV S
ectodomain was confirmed by analyzing its binding affinity to the CEACAM1a
ectodomain (the viral receptor) using microscale thermophoresis [Fig. 2(D)]. We
determined an equilibrium dissociation constant of 48.5 ± 3.8 nM, which is in good
agreement with the value of 21.4 ± 4.2 nM reported by Peng et al. 12 for the isolated
receptor-binding domain. Imaging of this sample using negative staining EM further
confirmed the homogeneity of the purified protein and its suitability for high-resolution
studies [Fig. 2(E)].
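How a dissociation constant can be extracted from a titration curve is sketched below. This is not the analysis used in the study: it assumes a 1:1 binding model with a fixed concentration of labeled receptor, fits synthetic noise-free data, and replaces a nonlinear optimizer with a simple grid search. The target concentration, signal levels and function names are all hypothetical.

```python
import math


def fraction_bound(titrant_nm, target_nm, kd_nm):
    """Fraction of labeled target bound, from the 1:1 quadratic binding equation."""
    s = titrant_nm + target_nm + kd_nm
    return (s - math.sqrt(s * s - 4.0 * titrant_nm * target_nm)) / (2.0 * target_nm)


def fit_line(x, y):
    """Closed-form least squares for y = a + b*x; returns (a, b, sse)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, sse


def fit_kd(conc_nm, signal, target_nm, kd_grid):
    """Grid search over Kd; baseline and amplitude are solved linearly per candidate."""
    best_kd, best_sse = None, None
    for kd in kd_grid:
        f = [fraction_bound(c, target_nm, kd) for c in conc_nm]
        _, _, sse = fit_line(f, signal)
        if best_sse is None or sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic, noise-free titration (all values hypothetical, for illustration only).
target = 10.0                                    # nM labeled receptor, assumed
conc = [10 ** (k / 4.0) for k in range(0, 17)]   # 1 nM to 10^4 nM titrant
true_kd = 48.5                                   # value used to generate the data
signal = [800.0 - 60.0 * fraction_bound(c, target, true_kd) for c in conc]

grid = [10 ** (k / 200.0) for k in range(0, 601)]  # 1 nM to 1000 nM, log-spaced
kd_fit = fit_kd(conc, signal, target, grid)
print(f"fitted Kd = {kd_fit:.1f} nM")
```

With real measurements, the noise in the signal determines the uncertainty on the fitted value, which is what the ± figures quoted above express.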

Cryo-EM data collection and processing

Ice thickness has a strong influence on the final achievable resolution of single-particle
reconstructions. Ideally, the vitreous ice should be as thin as possible while still
accommodating the particles of interest, which maximizes Thon ring intensity at high
spatial frequencies. 34 Imaging was completed on a Titan Krios 300 kV microscope
equipped with a Gatan K2 Summit direct electron detector operated in counting mode. 18
As in our previous work on the Thermoplasma acidophilum 20S proteasome, we
initially sought to acquire data from holes having the thinnest possible vitreous ice. 35
However, the MHV S protein clearly showed signs of denaturation when images were
acquired in such conditions [Fig. 3(A)]. We interpret this observation as resulting from
the surface tension exerted on the S trimers in thin vitreous ice. Hence, we targeted
holes with slightly thicker ice than desired, in which we could observe compact,
well-folded MHV S trimers, similar to what was observed using negative staining EM [Fig.
3(B)]. We collected a large dataset (1,600 micrographs) at high defocus (2.0-5.0 µm) to
maximize the low-resolution contrast and our ability to align the particle images during
subsequent processing. This example illustrates that although it is not always
possible to acquire data in the thinnest possible areas of ice (especially for fragile
protein complexes), near-atomic resolution reconstructions can still be obtained by
tailoring the imaging conditions appropriately.
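Why a higher defocus increases low-resolution contrast can be illustrated with a simplified contrast transfer function (CTF). The sketch below keeps only the defocus term of the phase shift and ignores spherical aberration and amplitude contrast, so the numbers are indicative only. The 0.5 µm value is a hypothetical low-defocus comparison and the 40 Å test frequency is arbitrary.

```python
import math


def electron_wavelength_angstrom(voltage_v: float) -> float:
    """Relativistic electron wavelength (angstroms) for a voltage given in volts."""
    return 12.2643 / math.sqrt(voltage_v + 0.97845e-6 * voltage_v ** 2)


def ctf(spatial_freq: float, defocus_angstrom: float, wavelength: float) -> float:
    """Simplified CTF: defocus term only (no spherical aberration, no amplitude contrast)."""
    chi = math.pi * wavelength * defocus_angstrom * spatial_freq ** 2
    return -math.sin(chi)


def first_zero_resolution(defocus_angstrom: float, wavelength: float) -> float:
    """Resolution (angstroms) of the first CTF zero under the same simplification."""
    return math.sqrt(wavelength * defocus_angstrom)


lam = electron_wavelength_angstrom(300e3)   # 300 kV, as in the text
for defocus_um in (0.5, 2.0, 5.0):          # 0.5 um is a hypothetical comparison point
    dz = defocus_um * 1e4                   # micrometers -> angstroms
    contrast = abs(ctf(1 / 40.0, dz, lam))
    zero = first_zero_resolution(dz, lam)
    print(f"{defocus_um} um: first zero near {zero:.1f} A, |CTF| at 40 A = {contrast:.2f}")
```

In this simplified picture, the contrast transferred at 40 Å grows as the defocus increases across the range shown, at the cost of faster CTF oscillations at high spatial frequencies.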

One of the major challenges encountered during processing of cryoEM data is the
presence of multiple 3D structures in a given dataset. These differences can result from
different conformations of the same protein, different chemical compositions due to loss
of one or several subunits of a protein complex, or (partial) denaturation of a fraction of
the particles during purification or vitrification. If left untreated, this heterogeneity can
limit the resolution and compromise the quality of the final map. 3D classification has
emerged as an extraordinarily powerful tool to deal with structural heterogeneity because
it allows homogeneous subsets of the data to be isolated computationally. 36-38

We relied on extensive 2D and 3D classification using the Relion software 39,40 to deal
with the marked structural heterogeneity of the MHV S ectodomain trimer dataset. We
ran a first round of 3D classification without imposing symmetry to improve separation of
“good” and “compromised” particle images. Figure 4 shows isosurface representations
[Fig. 4(A)] and slices through the center [Fig. 4(B)] of each of the four
reconstructions corresponding to the four classes requested during unsupervised 3D
classification. Although the classified maps did reveal differences between
the classes, the slices made the structural heterogeneity present in this dataset
apparent at a glance, as previously suggested by Scheres and colleagues. 19 At the
resolution of our analysis, we could not identify distinct conformations of the MHV S
trimer and postulated that the particles contributing to the less well-resolved classes
could be partially denatured.
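Once classes have been inspected, carrying forward only the particles assigned to the retained classes amounts to a simple filter on the per-particle class label. The sketch below uses a toy in-memory table; the key mirrors the Relion metadata label rlnClassNumber, while the particle names, class assignments and the choice of classes 1 and 2 are made up for illustration.

```python
from collections import Counter

# Toy particle table (hypothetical names and class assignments).
particles = [
    {"name": "p0001", "rlnClassNumber": 1},
    {"name": "p0002", "rlnClassNumber": 2},
    {"name": "p0003", "rlnClassNumber": 3},
    {"name": "p0004", "rlnClassNumber": 1},
    {"name": "p0005", "rlnClassNumber": 4},
    {"name": "p0006", "rlnClassNumber": 2},
    {"name": "p0007", "rlnClassNumber": 1},
    {"name": "p0008", "rlnClassNumber": 3},
]


def class_sizes(table):
    """Count how many particles were assigned to each class."""
    return Counter(p["rlnClassNumber"] for p in table)


def select_classes(table, keep):
    """Return only the particles whose class number is in `keep`."""
    keep = set(keep)
    return [p for p in table if p["rlnClassNumber"] in keep]


sizes = class_sizes(particles)
kept = select_classes(particles, keep={1, 2})   # retained classes: an assumption
print(dict(sizes), len(kept), f"{len(kept) / len(particles):.0%} retained")
```

The same bookkeeping applies at full scale, where the retained subset is then used for a new round of refinement.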

Comparison of the results of projection-matching refinements (using C3 symmetry) run
before and after the aforementioned 3D classification step suggested that both
reconstructions had similar resolution (4.4 Å) according to the gold standard Fourier
shell correlation (FSC = 0.143) criterion [Fig. 5(A)]. The quality of the two maps, however,
differed significantly, as only the reconstruction computed after 3D classification showed
features compatible with the resolution estimate [Fig. 5(B), 5(C)]. This case study
highlights that gold standard FSC measures internal consistency between two halves of
the data, 41,42 not resolution, and that the quality of the final map should always be in
agreement with any numerical estimates of resolution. Starting from 1,200,000 particle
images, we significantly reduced the size of the data set to 82,000 particles using 2D
and 3D classification to generate the final 3D reconstruction at 4 Å resolution showing
well resolved α-helices, β-strands and amino acid side chains for a large part of the map
[Fig. 5(B), 5(C)].
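For readers unfamiliar with how a resolution figure is read off an FSC curve, the sketch below finds the spatial frequency at which a curve first drops below the 0.143 threshold, using linear interpolation between sampled shells. The curve values are hypothetical and the function name is illustrative. As noted above, the resulting number reports agreement between half-maps and still has to be checked against the features visible in the map.

```python
def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Resolution (1/frequency) where the FSC first falls below the threshold.

    freqs: spatial frequencies in 1/angstrom, increasing.
    fsc:   FSC values at those frequencies.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two shells around the crossing.
            t = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            f_cross = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


# Hypothetical FSC curve sampled at six shells (not data from this study).
freqs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
fsc = [0.99, 0.95, 0.80, 0.40, 0.20, 0.086]

res = resolution_at_threshold(freqs, fsc)
print(f"FSC = 0.143 crossing at {res:.2f} A")
```

Two maps can yield the same crossing point while differing in interpretability, which is why the curve alone is not treated as sufficient in the comparison described above.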

Model building 

Obtaining an atomic model of the MHV S glycoprotein required a hybrid approach
combining docking of available crystal structures, de novo modeling using
Rosetta 23,25,43,44 and Coot, 45,46 and density-guided homology modeling using
RosettaCM. 24

The C-terminal S2 subunit, which is the fusion machinery, is best defined in the density
and was built using a combination of hand tracing with Coot and Rosetta de novo
building. 25 Large, bulky side chain densities, several disulfide bonds
resolved in the map, and density putatively corresponding to glycans on several
asparagine residues belonging to N-glycosylation sequons were used as internal
controls during model building [Fig. 6(A), 6(B)].
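Using glycan density as a register check relies on knowing where N-glycosylation sequons occur in the sequence. A minimal sketch of locating them is given below, assuming the standard N-X-S/T pattern with X being any residue except proline. The sequence is a made-up toy fragment, not the MHV S sequence, and the function name is hypothetical.

```python
import re

# N-X-[S/T] with X != P. The lookahead keeps matches one residue long so that
# adjacent or overlapping sequons are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(seq: str, first_residue_number: int = 1):
    """Return residue numbers of asparagines that start an N-X-S/T sequon."""
    return [m.start() + first_residue_number for m in SEQUON.finditer(seq)]


toy_sequence = "ANGTNPSWNKSNNTS"   # hypothetical fragment for illustration
positions = find_sequons(toy_sequence)
print(positions)
```

Extra density protruding from an asparagine at one of the predicted positions supports the assigned register, whereas density at an asparagine outside any sequon would suggest a register error.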

The density corresponding to the N-terminal receptor-binding S1 subunit is not as well
resolved as for the fusion machinery and features various levels of resolution in the
reconstruction. The availability of two crystal structures for domain A 12,47 (including a
structure of the MHV domain A) and of several crystal structures for domain B 10,11 was
of tremendous assistance and allowed us to directly dock these models into the
reconstruction. RosettaCM was then used to rebuild the core β-sheet of domain B and
to derive a putative model (using density-guided homology modeling) for the disordered
extension corresponding to the receptor-binding motifs in MERS-CoV and SARS-CoV.
The quality of the map corresponding to domains C and D hampered manual sequence
assignment for this region of the protein. Rosetta de novo 25 successfully identified a ~30
residue-long fragment, which allowed us to anchor the sequence register for domains C
and D. The placement of several bulky side chains accounted for by the density and the
identification of putative N-linked glycans suggested correct assignment and allowed
completion of the model [Fig. 6(C), 6(D)]. The density for the linker connecting the S1
and S2 subunits is poorly resolved, and Rosetta de novo was used to generate a
putative model of this region of the protein, which should be analyzed cautiously, as
suggested by the high B-factors associated with it.

Discussion 

In addition to recent developments in direct detector technology, the determination of 
the first near-atomic resolution structure of a coronavirus spike glycoprotein trimer was 
made possible by (i) engineering a pre-fusion stabilized ectodomain construct, (ii) using 
extensive computational classification of particle images to sort out sample 
heterogeneity and (iii) relying on major advances in the Rosetta automated model 
building algorithm. To conclude, our results allowed the identification of a conserved
neutralizing epitope at the surface of the protein and suggested potential vaccinology
strategies to elicit broadly neutralizing antibodies against coronaviruses. 22 This could
pave the way toward the development of the first vaccine against human coronaviruses.

Figure Legends:

Figure 1. 3D reconstruction of the MHV S trimer determined by single-particle
cryoEM. A-B, 4.0 Å resolution 3D map colored by protomer. Two different views of the
S trimer are shown: from the side (A) and from the top, looking toward the viral
membrane (B). C-D, Ribbon diagrams showing the MHV S atomic model oriented and
colored as in (A-B).

Figure 2. Construct design and biophysical characterization. (A) Schematic of the
construct used with each domain rendered with a different color. The S2 proteolytic
cleavage site (denoted with scissors) has been mutated to prevent furin processing. The
construct also harbors a GCN4 trimerization motif fused in register with the HR2 helix at
the C-terminal end of the MHV S encoding sequence. UH: upstream helix, FP: fusion
peptide. (B) SDS-PAGE analysis of the secreted MHV S ectodomain confirms the lack
of proteolytic processing during biogenesis. (C) Size exclusion chromatography coupled
online with multi-angle light scattering (SEC-MALS) showed that the expressed protein
forms a homotrimer. The deviation of the estimated molecular weight (463.2 ± 0.3
kDa) from the theoretical one (418.9 kDa) likely corresponds to N-linked glycans. The
red line represents the molecular mass (left axis, Da) while the blue line represents the
normalized refractive index (right axis, arbitrary units). (D) The MHV S ectodomain
binds with high affinity to soluble CEACAM1a (the host receptor). We determined an
equilibrium dissociation constant of 48.5 ± 3.8 nM using microscale thermophoresis,
suggesting the protein is properly folded. (E) Electron micrograph of negatively stained
MHV S showing homogeneous particles. Scale bar: 100 nm.

Figure 3. Micrographs of MHV S particles embedded in vitreous ice. (A) Particles 
showed signs of denaturation in regions of very thin ice likely due to excessive surface 
tension. (B) Data acquired in regions featuring slightly thicker ice than desired showed 
compact and well-folded particles, similar to what was observed using negative staining. 
Scale bar: 50 nm. 

Figure 4. Computational sorting of particle images using 3D classification. (A)
CryoEM reconstructions corresponding to the four classes requested during
unsupervised 3D classification using the Relion software without symmetry imposed.
(B) Slices through the center of the 3D reconstructions shown in (A). Only the left two
classes were retained for further processing.

Figure 5. Improvement of map quality at various stages of processing. (A) Gold
standard Fourier shell correlation (FSC) curves of the initial 3D reconstruction obtained
after 2D classification (pink), the 3D reconstruction obtained after the first round of 3D
classification (green) and the final reconstruction obtained after particle polishing and an
additional round of 3D classification with local searches (blue). (B-C) Density
corresponding to the upstream helix is shown alone (B) or with the corresponding
atomic model (C) for the three aforementioned maps (using the same coloring scheme
as in A) to illustrate the significant enhancement of map quality observed throughout
processing.

Figure 6. Hybrid modeling enabled building of an atomic model of the
MHV S trimer. (A) An example of a disulfide bond (rendered in green) present in the S2
fusion machinery. The observation of numerous disulfide bonds resolved in the cryoEM
reconstruction helped validate the register of the atomic model. (B) An example of a
putative glycosylation site where additional density protrudes from the Asn 893 side
chain. (C) Rosetta de novo placed a ~30 residue-long fragment that anchored the
register of the model in the density for domains C and D, which are characterized by
weaker density than the central regions of the reconstruction. Bulky side chains are
accounted for by the density and the map also shows additional density protruding from
an asparagine residue corresponding to a putative glycosylation sequon. (D) An
example of a putative glycosylation site where additional density protrudes from the Asn
657 side chain. Arrows indicate cryoEM density corresponding to putative glycans.
John Wiley & Sons 

This article is protected by copyright. All rights reserved. 






Protein Science 


[Figure 2 image. Recoverable labels: Domain A, B, C, D; S1 subunit; linker; S2 subunit; UH, FP, HR1, central helix, HR2, GCN4; cleavage sites S1-S2 and S2'. Panel D annotation: Kd = 48.5 ± 3.8 nM; x-axis: MHV S concentration (nM).]